Georgia Institute of Technology

I want to start by congratulating Professors Chandrasekaran, Parrilo and Willsky for this fine piece of work. Their paper, hereafter referred to as CPW, addresses one of the biggest practical challenges of Gaussian graphical models: how to make inferences for a graphical model in the presence of missing variables. The difficulty comes from the fact that the validity of the conditional independence relationships implied by a graphical model relies critically on the assumption that all conditioning variables are observed, which of course can be unrealistic. As CPW shows, the situation is not as hopeless as it might appear. They characterize conditions under which a conditional graphical model can be identified, and offer a penalized likelihood method to reconstruct it. CPW notes that, with missing variables, the concentration matrix of the observables can be expressed as the difference between a sparse matrix and a low-rank matrix, and proposes exploiting the sparsity with an $\ell_1$ penalty and the low-rank structure with a trace norm penalty. The trace norm penalty or, more generally, the nuclear norm penalty can be viewed as a convex relaxation of the more direct rank constraint. Its use often comes out of necessity, because rank constrained optimization can be computationally prohibitive. Interestingly, as I note here, the current problem actually lends itself to efficient algorithms for handling the rank constraint, and therefore allows for an attractive alternative to the approach of CPW.

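1. Rank constrained latent variable graphical Lasso. Recall that the penalized likelihood estimate of CPW is defined as

$$(\hat S_n, \hat L_n) = \operatorname*{argmin}_{L \succeq 0,\ S - L \succ 0} \left\{ -\ell(S - L, \hat\Sigma^n_O) + \lambda_n \left( \gamma \|S\|_1 + \operatorname{trace}(L) \right) \right\},$$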

where the vector $\ell_1$ norm and the trace (nuclear) norm penalties are designed to induce sparsity among the elements of $S$ and low-rank structure in $L$, respectively. Of course, we can instead attempt a more direct rank penalty, as opposed to the nuclear norm penalty on $L$, leading to

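$$(\hat S_n, \hat L_n) = \operatorname*{argmin}_{L \succeq 0,\ S - L \succ 0} \left\{ -\ell(S - L, \hat\Sigma^n_O) + \lambda_n \left( \gamma \|S\|_1 + \operatorname{rank}(L) \right) \right\};$$

or, for computational purposes, it is more convenient to consider the constrained version:

$$(\hat S_n, \hat L_n) = \operatorname*{argmin}_{L \succeq 0,\ S - L \succ 0,\ \operatorname{rank}(L) \le r} \left\{ -\ell(S - L, \hat\Sigma^n_O) + \lambda_n \|S^-\|_1 \right\},$$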

for some integer $0 \le r < p$, where $S^- = S - \operatorname{diag}(S)$; that is, $S^-$ equals $S$ except that its diagonal entries are replaced by 0. This slight modification reflects our intention to encourage sparsity on the off-diagonal entries of $S$ only. The remaining discussion, however, can easily be adapted to the original vector $\ell_1$ penalty on $S$. When $r = 0$, that is, $L = 0$, this new estimator reduces to the so-called graphical Lasso estimate (glasso, for short) of Yuan and Lin (2007); see also Banerjee, El Ghaoui and d'Aspremont (2008), Friedman, Hastie and Tibshirani (2008), and Rothman et al. (2008). In view of this similarity, I shall hereafter refer to this method as the latent variable graphical Lasso, or LVglasso, for short.
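To make the two criteria concrete, the following minimal NumPy sketch evaluates the CPW objective and the rank-constrained LVglasso objective for given $(S, L)$. It assumes the Gaussian log-likelihood $\ell(K, \hat\Sigma) = \log\det K - \operatorname{trace}(\hat\Sigma K)$ with constants and the factor $n/2$ dropped, counts the rank of $L$ numerically with a tolerance, and uses hypothetical function names; it is neither CPW's MATLAB code nor an optimizer, only an evaluator of the objectives.

```python
import numpy as np

def gaussian_loglik(K, Sigma_hat):
    """log det K - trace(Sigma_hat K): Gaussian log-likelihood up to constants and scaling."""
    try:
        C = np.linalg.cholesky(K)  # raises LinAlgError unless K is positive definite
    except np.linalg.LinAlgError:
        return -np.inf
    return 2.0 * np.sum(np.log(np.diag(C))) - np.trace(Sigma_hat @ K)

def offdiag(S):
    """S^- = S - diag(S): S with its diagonal entries set to zero."""
    return S - np.diag(np.diag(S))

def cpw_objective(S, L, Sigma_hat, lam, gamma, tol=1e-10):
    """-l(S - L, Sigma_hat) + lam * (gamma * ||S||_1 + trace(L)); inf if infeasible."""
    if np.linalg.eigvalsh(L).min() < -tol:  # L must be positive semidefinite
        return np.inf
    penalty = gamma * np.abs(S).sum() + np.trace(L)
    # If S - L is not positive definite, the log-likelihood is -inf and the objective inf.
    return -gaussian_loglik(S - L, Sigma_hat) + lam * penalty

def lvglasso_objective(S, L, Sigma_hat, lam, r, tol=1e-8):
    """-l(S - L, Sigma_hat) + lam * ||S^-||_1 subject to L PSD and rank(L) <= r."""
    eig_L = np.linalg.eigvalsh(L)
    # Numerical rank of L: number of eigenvalues above tol.
    if eig_L.min() < -tol or np.sum(eig_L > tol) > r:
        return np.inf
    return -gaussian_loglik(S - L, Sigma_hat) + lam * np.abs(offdiag(S)).sum()
```

For instance, with $S = 2I_3$ and the rank-one matrix $L = 0.1\,\mathbf{1}\mathbf{1}^\top$, `lvglasso_objective` returns infinity for $r = 0$ (where the estimator reduces to glasso and $L$ must vanish) and a finite value for $r = 1$.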

Common wisdom holds that $(\hat S_n, \hat L_n)$ is infeasible to compute because of the nonconvexity of the rank constraint. Interestingly, though, this more direct approach actually allows for fast computation, thanks to a combination of the EM algorithm and some recent advances in computing graphical Lasso estimates for high-dimensional problems.

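2. An EM algorithm. The constraint $\operatorname{rank}(L) \le r$ amounts to postulating $r$ latent variables, and the latent variable model naturally has a missing data formulation. Write $X = (X_O^\top, X_H^\top)^\top$ for the complete data, where $X_O$ collects the $p$ observed variables and $X_H$ the $r$ hidden ones, and let $K$ be the $(p + r) \times (p + r)$ concentration matrix of $X$, so that $S = K_{OO}$ and $L = K_{OH} K_{HH}^{-1} K_{HO}$. If the complete data were observed, the LVglasso estimator would become

$$\hat K = \operatorname*{argmin}_{K \succ 0} \left\{ \mathcal{L}(K) + \lambda \|K_{OO}^-\|_1 \right\},$$

where

$$\mathcal{L}(K) = -\ln\det(K) + \operatorname{trace}(\hat\Sigma^n_{O \cup H} K)$$

and $\hat\Sigma^n_{O \cup H}$ is the sample covariance matrix of the full data. Since $X_H$ is unobservable, we can use an EM algorithm that iteratively applies the following two steps: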

Expectation step (E step). Calculate the expected value of the penalized negative log-likelihood function with respect to the conditional distribution of $X_H$ given $X_O$ under the current estimate $K^{(t)}$ of $K$, leading to the so-called Q function:

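$$Q(K \mid K^{(t)}) = E_{X_H \mid X_O, K^{(t)}}\left[ \mathcal{L}(K) + \lambda \|K_{OO}^-\|_1 \right] = -\ln\det(K) + \operatorname{trace}\left\{ E_{X_H \mid X_O, K^{(t)}}(\hat\Sigma^n_{O \cup H})\, K \right\} + \lambda \|K_{OO}^-\|_1.$$

Recall that $X_H \mid X_O, K^{(t)}$ follows a normal distribution with

$$E(X_H \mid X_O, K^{(t)}) = \Sigma^{(t)}_{HO} (\Sigma^{(t)}_{OO})^{-1} X_O$$

and

$$\operatorname{Var}(X_H \mid X_O, K^{(t)}) = \Sigma^{(t)}_{HH} - \Sigma^{(t)}_{HO} (\Sigma^{(t)}_{OO})^{-1} \Sigma^{(t)}_{OH},$$

where $\Sigma^{(t)} = (K^{(t)})^{-1}$. Therefore, writing $\hat\Sigma^n_O$ for the sample covariance matrix of the observed data, the observed block of $E_{X_H \mid X_O, K^{(t)}}(\hat\Sigma^n_{O \cup H})$ is simply $\hat\Sigma^n_O$, while

$$E_{X_H \mid X_O, K^{(t)}}(\hat\Sigma^n_{OH}) = \hat\Sigma^n_O (\Sigma^{(t)}_{OO})^{-1} \Sigma^{(t)}_{OH}$$

and

$$E_{X_H \mid X_O, K^{(t)}}(\hat\Sigma^n_{HH}) = \Sigma^{(t)}_{HH} - \Sigma^{(t)}_{HO} (\Sigma^{(t)}_{OO})^{-1} \Sigma^{(t)}_{OH} + \Sigma^{(t)}_{HO} (\Sigma^{(t)}_{OO})^{-1} \hat\Sigma^n_O (\Sigma^{(t)}_{OO})^{-1} \Sigma^{(t)}_{OH}.$$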
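The E step therefore reduces to a few matrix operations on the observed sample covariance. The sketch below assembles the expected full-data sample covariance block by block from the formulas above. It assumes mean-zero data (so that sample covariances are averages of outer products) and that the first $p$ coordinates of $K^{(t)}$ are the observed ones; the name `e_step` is hypothetical.

```python
import numpy as np

def e_step(Sigma_hat_O, K_t, p):
    """Expected full-data sample covariance given the observed sample covariance.

    Sigma_hat_O: p x p sample covariance of the (centered) observed variables X_O.
    K_t: current (p + r) x (p + r) concentration matrix K^(t); first p coordinates observed.
    """
    Sigma_t = np.linalg.inv(K_t)  # Sigma^(t) = (K^(t))^{-1}
    S_OO, S_OH, S_HH = Sigma_t[:p, :p], Sigma_t[:p, p:], Sigma_t[p:, p:]
    # B = (Sigma_OO^(t))^{-1} Sigma_OH^(t): regression coefficients of X_H on X_O.
    B = np.linalg.solve(S_OO, S_OH)
    W_OH = Sigma_hat_O @ B
    # Conditional variance plus the spread of the conditional means.
    W_HH = S_HH - S_OH.T @ B + B.T @ Sigma_hat_O @ B
    W = np.block([[Sigma_hat_O, W_OH], [W_OH.T, W_HH]])
    return (W + W.T) / 2  # symmetrize against round-off
```

A useful sanity check: if the observed sample covariance happens to equal the model covariance $\Sigma^{(t)}_{OO}$, the expected full-data covariance is exactly $\Sigma^{(t)}$, since the last two terms of the $HH$ block then cancel.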

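Maximization step (M step). Minimize $Q(\cdot \mid K^{(t)})$ (equivalently, maximize the expected penalized log-likelihood) over all $(p + r) \times (p + r)$ positive definite matrices. We first note that if we replace the penalty term $\|K_{OO}^-\|_1$ with $\|K^-\|_1$, then minimizing $Q(\cdot \mid K^{(t)})$ becomes a glasso problem:

$$\min_{K \succ 0} \left\{ -\ln\det(K) + \operatorname{trace}(WK) + \lambda \|K^-\|_1 \right\},$$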

where $W = E_{X_H \mid X_O, K^{(t)}}(\hat\Sigma^n_{O \cup H})$. As shown in Banerjee, El Ghaoui and d'Aspremont (2008), Friedman, Hastie and Tibshirani (2008) and Yuan (2008), this problem can be solved iteratively. At each iteration, one row of $K$ and, by symmetry, the corresponding column are updated by solving a Lasso problem. The same idea can be applied here to minimize $Q(\cdot \mid K^{(t)})$; the only difference is that in each of the Lasso problems we leave the coordinates corresponding to the latent variables unpenalized. This extension has been implemented in the R package glasso [Friedman, Hastie and Tibshirani (2008)].
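The row-by-row scheme with unpenalized latent coordinates can be sketched with an entrywise penalty matrix. The code below is an illustrative NumPy implementation written in the spirit of Friedman, Hastie and Tibshirani (2008), not the code of the R package glasso, and `glasso_penalty_matrix` is a hypothetical name. It minimizes $-\ln\det(K) + \operatorname{trace}(SK) + \sum_{j \ne k} \rho_{jk} |K_{jk}|$ with the diagonal unpenalized. For the M step one would pass $S = W$ and set $\rho_{jk} = \lambda$ when both $j$ and $k$ index observed variables and $\rho_{jk} = 0$ otherwise, so that the penalty is exactly $\lambda \|K_{OO}^-\|_1$.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def glasso_penalty_matrix(S, Rho, n_outer=100, n_inner=1000, tol=1e-10):
    """Minimize -log det K + trace(S K) + sum_{j != k} Rho[j, k] * |K[j, k]|.

    Rho[j, k] = 0 leaves K[j, k] unpenalized (e.g. entries involving latent coordinates).
    """
    d = S.shape[0]
    W = S.copy()          # running estimate of K^{-1}; its diagonal stays equal to diag(S)
    B = np.zeros((d, d))  # column j holds the Lasso coefficients for row/column j
    for _ in range(n_outer):
        W_old = W.copy()
        for j in range(d):
            idx = np.arange(d) != j
            W11, s12, rho = W[np.ix_(idx, idx)], S[idx, j], Rho[idx, j]
            beta = B[idx, j]  # warm start from the previous sweep
            # Coordinate descent for min_b 0.5 b'W11 b - b's12 + sum_k rho_k |b_k|.
            for _ in range(n_inner):
                beta_old = beta.copy()
                for k in range(d - 1):
                    partial = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(partial, rho[k]) / W11[k, k]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            B[idx, j] = beta
            W[idx, j] = W[j, idx] = W11 @ beta
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover K from W and the Lasso coefficients.
    K = np.zeros((d, d))
    for j in range(d):
        idx = np.arange(d) != j
        beta = B[idx, j]
        K[j, j] = 1.0 / (W[j, j] - W[idx, j] @ beta)
        K[idx, j] = -beta * K[j, j]
    return (K + K.T) / 2
```

Two limiting cases are easy to check: with all penalties zero the result is $S^{-1}$, and with very large penalties it is the diagonal matrix with entries $1/S_{jj}$. The loops favor readability over speed; for problems of the size considered below, a compiled implementation is preferable.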

3. Example. For illustration purposes, I conducted a simple numerical experiment in which the goal was to recover a $p = 198$ dimensional graphical model with $h = 2$ missing variables. The graphical model was generated in a similar fashion to that of Meinshausen and Bühlmann (2006). I first simulated 198 locations uniformly over a square. Between each pair of locations, I put an edge with probability $2\phi(d\sqrt{p})$, where $\phi(\cdot)$ is the density function of the standard normal distribution and $d$ is the distance between the two locations, unless one of the locations was already connected with four other locations. The two hidden variables were connected with all $p$ observed variables. The entries of the inverse covariance matrix corresponding to the edges between the observables were assigned the value 0.2, and those between the observables and the latent variables were assigned a uniform random value between … and 0.12, to ensure positive definiteness. A typical simulated graphical model among the 198 observed variables conditional on the two latent variables is given in the top left panel of Figure 1.

[Fig. 1. True graphical model and its estimates. Panels: True Conditional Graph (top left); Estimated Conditional Graph (glasso) (top right); Estimated Conditional Graph (LVglasso) (bottom left); Estimated Conditional Graph (CPW) (bottom right).]

We applied both the method of CPW and LVglasso, along with glasso, to the data, using the MATLAB code provided by CPW to compute their estimates. As observed by CPW, their estimate is typically insensitive to a wide range of values of $\gamma$, and we report here the results with the default choice $\gamma = 5$. Similarly, for LVglasso, little variation was observed for $r = 2, \ldots, 10$, and we focus on $r = 2$ for brevity. The choice of $\lambda$ plays a critical role for both methods, so we computed both estimators over a fine grid of $\lambda$. With the main focus on recovering the conditional graphical model, that is, the sparsity pattern of $S$, we report in Figure 2 the ROC curve for both methods. For contrast, we also report the result for glasso, which neglects the missingness. In Figure 1, we also present, for each method, the estimated graphical model that is closest to the truth. These results clearly demonstrate the necessity of accounting for the latent variables. It is also interesting to note that the rank constrained estimator performs slightly better in this example than the trace norm penalization method of CPW.

[Fig. 2. Accuracy of reconstructed conditional graphical model.]
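Each value of $\lambda$ yields one pair of false and true positive rates for the recovered sparsity pattern of $S$; sweeping $\lambda$ over a grid traces out an ROC curve. The sketch below computes one such point. It assumes an edge is declared present whenever $|\hat S_{jk}|$ exceeds a small tolerance and counts each unordered pair once; the tolerance and the name `edge_recovery_rates` are illustrative choices, not details of the experiment above.

```python
import numpy as np

def edge_recovery_rates(S_hat, S_true, tol=1e-8):
    """True and false positive rates for recovering the off-diagonal support of S."""
    p = S_true.shape[0]
    upper = np.triu_indices(p, k=1)  # each unordered pair (edge) counted once
    est = np.abs(S_hat[upper]) > tol
    truth = np.abs(S_true[upper]) > tol
    tpr = np.sum(est & truth) / max(np.sum(truth), 1)
    fpr = np.sum(est & ~truth) / max(np.sum(~truth), 1)
    return tpr, fpr
```

For example, if the true graph on three nodes has the single edge $\{1, 2\}$ and an estimate selects $\{1, 2\}$ and $\{2, 3\}$, the true positive rate is 1 and the false positive rate is 1/2.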

The preliminary results presented here suggest that a direct rank constraint may provide a competitive alternative to trace norm penalization for recovering graphical models with latent variables. It is of interest to investigate more rigorously how the two methods compare with each other.